Bioinformatics An Introduction 4th Edition (Jeremy Ramsden)

Medicine and Disease

Much of the business of bioinformatics concerns the correlation of phenotype with

genotype, with the transcriptome and proteome acting as intermediaries.⁴ Bioinformatics

gives an unprecedented ability to scrutinize the intermediate levels and

establish correlations far more extensively and in far more detail than was ever possible

before the advent of high-throughput sequencing and other omics technologies,

along with the computing power enabling the handling (including storage) and analysis

of huge datasets. This ability is revolutionizing medicine. In this spirit, one may

represent the human being as a gigantic table of correlations, comprising successive

columns of genes and genetic variation, environmental conditions (the exposome),

protein levels, and physiological states and interactions.⁵

Medicine is especially concerned with investigating physiological disorders, and

the techniques of bioinformatics allow one to establish correlations between those

disorders and variations in the genome and proteome of a patient;⁶ medical applications

of bioinformatics are often concerned with the investigation of deleterious

genetic variation and with abnormal protein expression patterns.

More and more data on the genotype of individuals are being gathered. Millions of

single-nucleotide polymorphisms (SNPs) are now documented, and studies involving

the genotyping of hundreds of SNPs in thousands of people are now feasible.⁷

As pointed out earlier (Sect. 14.4.3), most of the genetic variability across human

populations can be accounted for by SNPs, and most of the SNP variation can be

grouped into a small number of haplotypes.⁸ This growing database might be useful

for elucidating the genetic basis of disease, or susceptibility to disease, and hence

preventive treatment for those screened routinely. This does, however, raise the ethical

difficulties associated with prevention, which is not properly part of medicine

(Ramsden 2021). The use of genetic information is further discussed in Sect. 26.3.
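
As a minimal sketch of how a correlation between one SNP and a disorder might be quantified, the following Python fragment computes an allelic odds ratio (with an approximate 95% confidence interval) and a Pearson chi-square statistic from a 2x2 table of allele counts in cases and controls. The function names and the counts in the usage lines are hypothetical illustrations, not data or methods taken from the text; a real study would also need to correct for multiple testing and for population structure.

```python
from math import exp, log, sqrt


def allelic_odds_ratio(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Return (odds ratio, lower, upper) for a 2x2 allele-count table.

    The interval is the usual normal approximation on the log odds ratio.
    All four counts must be positive (hypothetical helper, for illustration).
    """
    if min(case_alt, case_ref, ctrl_alt, ctrl_ref) <= 0:
        raise ValueError("all four counts must be positive")
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    std_err = sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    lower = exp(log(odds_ratio) - 1.96 * std_err)
    upper = exp(log(odds_ratio) + 1.96 * std_err)
    return odds_ratio, lower, upper


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (1 degree of freedom, no continuity
    correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    return n * (a * d - b * c) ** 2 / denominator


if __name__ == "__main__":
    # Invented counts: variant/reference alleles in cases, then in controls.
    print(allelic_odds_ratio(30, 70, 15, 85))
    print(chi_square_2x2(30, 70, 15, 85))
```

An odds ratio above 1 whose interval excludes 1 would suggest that the variant allele is more common among cases, which is a correlation only and says nothing by itself about mechanism.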

The wish to develop genetic screening implies a need for a much more rapid and

inexpensive way of screening for mutations than is possible with genome sequencing.

The classic method is to digest the gene with restriction enzymes and analyse the

fragments, separated by gel electrophoresis, using Southern blotting (see footnote 2 in

Chap. 18). Although direct genotyping with allele-specific hybridization is possible

in simple genomes (e.g., yeast), the complexity of the human genome renders this

4 Indeed, one could view the organism as a gigantic hidden Markov model (Sect. 17.5.2), in which

the gene controls switching between physiological states via protein expression. Unlike the simpler

models considered earlier, here the outputs could intervene in hidden layers.

5 Since the physiological column includes entries for neurophysiological states, it might be tempting

to continue the table by adding a column for the conscious experiences corresponding to the

physiological and other entries. One must be careful to note, however, that conscious experience

is in a different category from the entries in the columns that precede it (Ramsden 2001). Hence,

correlation cannot be taken to imply identity (in the same way, a quadratic equation with two roots

derived by a piece of electronic hardware is embodied in the hardware, but it makes no sense to say

that the hardware has two roots, despite the fact that those roots have well-defined correlates in the

electronic states of the circuit components).

6 Mossink et al. (2012).

7 These data can also be used to infer population structures (Jakobsson et al. 2008).

8 These investigations are closely related to those of linkage disequilibrium (nonrandom association

between alleles at different loci).
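
The nonrandom association mentioned in footnote 8 is commonly summarized, for two biallelic loci, by the coefficient D = p(AB) - p(A)p(B) and the squared correlation r² = D² / [p(A)(1 - p(A)) p(B)(1 - p(B))]. The Python sketch below computes both from phased haplotype counts; the function name, the dictionary keys and the example counts are assumptions made purely for illustration.

```python
def linkage_disequilibrium(haplotype_counts):
    """Return (D, r_squared) for two biallelic loci.

    haplotype_counts maps the four haplotypes 'AB', 'Ab', 'aB', 'ab' to
    their observed counts (hypothetical input format, for illustration).
    """
    n_AB = haplotype_counts["AB"]
    n_Ab = haplotype_counts["Ab"]
    n_aB = haplotype_counts["aB"]
    n_ab = haplotype_counts["ab"]
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes observed")

    p_AB = n_AB / total           # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total   # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / total   # frequency of allele B at the second locus

    d = p_AB - p_A * p_B          # zero when the loci are independent
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denominator == 0:
        raise ValueError("a locus is monomorphic, so r squared is undefined")
    return d, d * d / denominator


if __name__ == "__main__":
    # Invented counts in which A tends to travel with B, and a with b.
    print(linkage_disequilibrium({"AB": 40, "Ab": 10, "aB": 10, "ab": 40}))
```

Values of r² near 1 indicate that genotyping one of the two SNPs largely predicts the other, which is what allows SNP variation to be grouped into a small number of haplotypes.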